Experimental Estimation of Number of Clusters Based on Cluster Quality
Abstract
Text clustering is a text mining technique that divides a given set of text documents into meaningful clusters. It is used to organize large collections of text documents into a well-structured form. Most clustering algorithms require the number of clusters to be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to determine the number of clusters based on cluster quality. Since partitional clustering algorithms are well suited to clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.
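The idea in the abstract, choosing the number of clusters by comparing cluster quality across candidate values, can be sketched as follows. This is only an illustration under stated assumptions: the abstract names neither the partitional algorithm nor the quality measure, so k-means with a farthest-first initialization and the mean silhouette coefficient are stand-ins chosen here, and the 2-D points are toy data rather than document vectors. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption): start at the first point,
    then repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # cohesion
        b = min(sum(dist(p, q) for q in other) / len(other)  # separation
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_values):
    """Cluster once per candidate k and keep the k with the best quality score."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data (an assumption): three well-separated groups of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)]
          for dx, dy in offsets]

best_k, scores = estimate_k(points, range(2, 7))
print("estimated number of clusters:", best_k)
```

For real document collections the points would be high-dimensional term vectors, and another quality index or distance (for example cosine-based) could be substituted in `silhouette` without changing the overall sweep over candidate values of k.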
Related resources
Target Detection Improvements in Hyperspectral Images by Adjusting Band Weights and Identifying end-members in Feature Space Clusters
Spectral target detection can be regarded as one of the strategic applications of hyperspectral data analysis. Targets that occupy an area smaller than a pixel's ground coverage have led to the development of spectral un-mixing methods to detect them. Usually, spectral un-mixing algorithms assume similar weights for the spectral bands. Howe...
Application of a Self-Organizing Map for Clustering the Groundwater Quality in Kerman Province and Assessment its Suitability for Drinking and Irrigation Purposes
Evaluation of groundwater hydrochemical characteristics is necessary for planning and managing water resources in terms of quality. In the present study, a self-organizing map (SOM) clustering technique was used to identify homogeneous clusters of hydrochemical parameters in the water resources (including wells, springs, and qanats) of Kerman province; then, the quality classification of groun...
Estimation of genetic diversity in rice (Oryza sativa L.) genotypes using SSR markers under salinity stress. Fatemeh Gholizadeh and Saeed Navabpour
To study genetic diversity in rice (Oryza sativa L.), 29 genotypes consisting of landraces and pure and improved lines were evaluated using simple sequence repeat (SSR) markers. A total of 30 SSR primers were used to amplify part of the rice genome in the germplasm; the PIC values ranged from 0.07 (RM 340) to 0.71 (RM 7426), with an average of 0.45. The results showed a total number of 106...
Simulation of Fabrication toward High Quality Thin Films for Robotic Applications by Ionized Cluster Beam Deposition
The most commonly used method for producing thin films is based on the deposition of atoms or molecules onto a solid surface. One suitable method for producing high-quality metallic, semiconductor, and organic thin films is ionized cluster beam deposition (ICBD); such films are used in electronic, robotic, optical, and optoelectronic devices. Many important factors such as cluster size, cluster...
KD-Tree Based Clustering for Gene Expression Data
K-means is one of the most widely researched clustering algorithms, but it is sensitive to the selection of initial cluster centers and to the estimate of the number of clusters. In this chapter, we propose a novel approach that finds efficient initial cluster centers using a kd-tree and computes the number of clusters using a joint distance function. We have carried out extensive experiments on various synt...
Journal title:
- CoRR
Volume abs/1503.03168 (issue not given)
Pages: -
Publication date: 2014